Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix Tree Reduction on new instance type p3dn.24xlarge #13852

Merged
merged 12 commits into from
Jan 12, 2019

Conversation

ctcyang
Copy link
Contributor

@ctcyang ctcyang commented Jan 11, 2019

Description

Fixes tree reduction failing on new instance type p3dn.24xlarge, which was raised here: dmlc/gluon-nlp#520

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Adds a fallback for cudaDeviceGetP2PAttribute topology information to use cudaDeviceEnablePeerAccess when the former is inconsistent across instance types.
  • Fixes tree reduction breaking when cudaDeviceGetP2PAttribute provides wrong information.

Comments

  • On p3dn.24xlarge using CUDA 9.0, we observe that the cudaDeviceGetP2PAttribute topology looks different compared to when we use CUDA 9.0 with p3.16xlarge:
  // Check that all P2P connections are detected by GetP2PAttribute
  // If yes, then continue as before
  // If not, then fallback to using cudaDeviceEnablePeerAccess saved in p2p_matrix 
  //   (from EnableP2P() in src/kvstore/comm_tree.h)
  //
  // We have observed that with CUDA 9.0 p3.16xlarge:
  //
  //   0 2 2 3 3 1 1 1                . v v v v . . .
  //   2 0 3 2 1 3 1 1                v . v v . v . .
  //   2 3 0 3 1 1 2 1                v v . v . . v .
  //   3 2 3 0 1 1 1 2                v v v . . . . v
  //   3 1 1 1 0 2 2 3                v . . . . v v v
  //   1 3 1 1 2 0 3 2                . v . . v . v v
  //   1 1 2 1 2 3 0 3                . . v . v v . v
  //   1 1 1 2 3 2 3 0                . . . v v v v .
  //
  //        matrix                       p2p_matrix
  // cudaDeviceGetP2PAttribute   cudaDeviceEnablePeerAccess
  //
  // Here, they are correctly detected, because the 2s and 3s correspond to
  // links that have P2P connections between them. However for CUDA 9.0 p3dn.24xlarge:
  //
  //   0 2 2 1 1 1 1 1                . v v v v . . .
  //   2 0 1 2 1 1 1 1                v . v v . v . .
  //   2 1 0 1 1 1 2 1                v v . v . . v .
  //   1 2 1 0 1 1 1 2                v v v . . . . v
  //   1 1 1 1 0 2 2 1                v . . . . v v v
  //   1 1 1 1 2 0 1 2                . v . . v . v v
  //   1 1 2 1 2 1 0 1                . . v . v v . v
  //   1 1 1 2 1 2 1 0                . . . v v v v .
  //  
  //        matrix                      p2p_matrix
  // cudaDeviceGetP2PAttribute   cudaDeviceEnablePeerAccess
  //
  // The fastest connections (3 i.e. double NVLink) are not recognized as being a
  //   connection

@codecov-io
Copy link

Codecov Report

Merging #13852 into master will decrease coverage by 9.59%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #13852     +/-   ##
=========================================
- Coverage   81.64%   72.04%   -9.6%     
=========================================
  Files         721      717      -4     
  Lines       80158    81483   +1325     
  Branches     3233     3233             
=========================================
- Hits        65441    58704   -6737     
- Misses      13847    21898   +8051     
- Partials      870      881     +11
Impacted Files Coverage Δ
src/kvstore/comm_tree.h 0% <0%> (-95.89%) ⬇️
src/kvstore/gpu_topology.h 84.9% <0%> (-6.21%) ⬇️
src/operator/nn/mkldnn/mkldnn_concat-inl.h 0% <0%> (-100%) ⬇️
src/operator/subgraph/mkldnn/mkldnn_conv-inl.h 0% <0%> (-100%) ⬇️
src/operator/cudnn_spatial_transformer-inl.h 0% <0%> (-100%) ⬇️
src/operator/cudnn_bilinear_sampler-inl.h 0% <0%> (-100%) ⬇️
...erator/tensor/elemwise_binary_broadcast_op-inl.cuh 0% <0%> (-100%) ⬇️
...erator/quantization/mkldnn/mkldnn_requantize-inl.h 0% <0%> (-98.58%) ⬇️
src/operator/nn/mkldnn/mkldnn_lrn-inl.h 0% <0%> (-98.49%) ⬇️
src/operator/nn/mkldnn/mkldnn_concat.cc 1.56% <0%> (-98.44%) ⬇️
... and 386 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a6ed619...1ca7aab. Read the comment docs.

@anirudhacharya
Copy link
Member

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jan 11, 2019
@eric-haibin-lin eric-haibin-lin merged commit 0d76675 into apache:master Jan 12, 2019
eric-haibin-lin pushed a commit to eric-haibin-lin/mxnet that referenced this pull request Jan 16, 2019
* add fallback for gpu topology detection using CUDA 9.2

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* bring 3rdparty packages to upstream/master

* rebase to master

* Update gpu_topology.h
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* add fallback for gpu topology detection using CUDA 9.2

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* add fallback for gpu topology detection using CUDA 9.2

* add log

* update 3rdparty to master

* bring 3rdparty packages to upstream/master

* rebase to master

* Update gpu_topology.h
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants